Distant Multi-speaker Voice Activity Detection Using Relative Energy Ratio
Abstract
While single-speaker voice activity detection is a well-studied problem, multi-speaker voice activity detection (MSVAD) for distant speech recognition remains a challenging task. In this work, we propose a new MSVAD system for identifying the voice activity of an individual speaker from distant speech data captured with a microphone array. In contrast to conventional energy-based approaches, our MSVAD algorithm uses information from the interfering channels in a hierarchical manner to adaptively adjust the detection threshold. We demonstrate the effectiveness of our MSVAD algorithm through experiments on the Speech Separation Challenge corpus [1]. An MSVAD technique using the cross-meeting normalized energy criterion [2] yielded a missed detection rate (MDR) of 7.4% with a false alarm rate (FAR) of 28.0%. Incorporating the proposed criterion into the algorithm reduced the MDR and FAR to 4.3% and 19.6%, respectively. Our algorithm also achieved speech recognition performance comparable to that obtained with manual segmentation. Moreover, our method requires no parameter training and has low computational complexity.
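The abstract does not give the exact form of the relative energy ratio criterion or of the hierarchical threshold adaptation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a frame is attributed to the target speaker when the target channel's energy exceeds a fixed multiple of the strongest interfering channel's energy, and it assumes the common frame-level definitions of MDR (missed speech frames over reference speech frames) and FAR (falsely detected frames over reference non-speech frames). The function names and the `ratio_threshold` parameter are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def relative_energy_vad(target_energy, interferer_energies,
                        ratio_threshold=2.0, floor=1e-12):
    """Per-frame decision for one target speaker.

    Assumption (not from the paper): a frame counts as target speech when the
    target channel's energy is more than `ratio_threshold` times the energy of
    the strongest interfering channel in that frame. The fixed threshold stands
    in for the adaptive, hierarchical threshold described in the abstract.
    """
    decisions = []
    for t, e_target in enumerate(target_energy):
        e_interf = max(channel[t] for channel in interferer_energies)
        decisions.append(e_target / (e_interf + floor) > ratio_threshold)
    return decisions


def mdr_far(reference, hypothesis):
    """Frame-level missed detection rate and false alarm rate.

    Assumed definitions: MDR = missed speech frames / reference speech frames,
    FAR = false alarm frames / reference non-speech frames.
    """
    speech = sum(1 for r in reference if r)
    nonspeech = len(reference) - speech
    missed = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if h and not r)
    mdr = missed / speech if speech else 0.0
    far = false_alarms / nonspeech if nonspeech else 0.0
    return mdr, far


# Toy example with made-up frame energies for one target and one interferer.
target = [4.0, 1.0, 0.5, 3.0]
interferers = [[1.0, 1.0, 2.0, 2.0]]
decisions = relative_energy_vad(target, interferers)
reference = [True, False, False, True]
mdr, far = mdr_far(reference, decisions)
```

A ratio against the interfering channels, rather than an absolute energy threshold, is what lets a decision of this kind reject frames in which the target microphone mainly picks up crosstalk from another speaker.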
Similar resources
Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation
Interview speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has substantially lower signal-to-noise ratio, which necessitates robust voice activity detection (VAD). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/nonspeech segmentat...
Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition
This paper studies overlapped speech detection for improving the performance of the summed channel speaker recognition system in the NIST Speaker Recognition Evaluation (SRE). The speaker recognition system includes four main modules: voice activity detection, speaker diarization, overlapped speaker detection, and speaker recognition. We adopt a GMM-based overlapped speaker detection system, by ...
A study of voice activity detection techniques for NIST speaker recognition evaluations
Since 2008, interview-style speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has lower signal-to-noise ratio, which necessitates robust voice activity detectors (VADs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/non-speech seg...
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech from noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator
In this letter, a robust voice activity detection (VAD) algorithm is presented. This proposed VAD algorithm makes use of the perceptual wavelet-packet transform and the Teager energy operator to compute a robust parameter called voice activity shape for VAD. The main advantage of this algorithm is that the preset threshold values or a priori knowledge of the SNR usually needed in conventional V...